-
Notifications
You must be signed in to change notification settings - Fork 25.5k
[ES|QL] add reverse function #113297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ES|QL] add reverse function #113297
Conversation
Hi @drewdaemon, I've created a changelog YAML for you. |
…n/elasticsearch into 98545/reverse-function
…98545/reverse-function
message:keyword | message_reversed:keyword | ||
😂😁😅😐🥺😢😭 | 😭😢🥺😐😅😁😂 | ||
// end::reverseEmoji-result[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice - small comment, I would use different emoticons to better signal visually the reversal: 🏡🚌✈ becomes ✈ 🚌🏡
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally add also a string that combines and intertwines regular chars + emoticons + extended ones (こんにちは) and check they are properly reversed (namely that the encoding is preserved).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice - small comment, I would use different emoticons to better signal visually the reversal
I think of the current ones like a sort of easter egg in the docs... you have to squint a bit and then you're rewarded with "getting it."
That said, I defer to you — LMK
cc @leemthompo in case he has input
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Costin's right that it would be easier with a bit more visual variance in the emojis but we're knee deep in nit territory here 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually Liam this is the most important conversation going on in this PR right now 😆
Okay, I'll take another stab at emoji art 🖼️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My battle for these emojis is met with opposition at every turn... first it was our handling of unicode graphemes and now it's a custom font on the docs website... will I prevail?....
Screen.Recording.2024-09-26.at.4.01.39.PM.mov
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
def(Locate.class, Locate::new, "locate"), | ||
def(Repeat.class, Repeat::new, "repeat"), | ||
def(Space.class, Space::new, "space") }, | ||
def(Trim.class, Trim::new, "trim") }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sorting!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All JetBrains :)
…98545/reverse-function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor/optional drive-by comments only.
return reversed.toString(); | ||
} | ||
|
||
private static boolean isOneByteUTF8(BytesRef ref) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could also be isASCII()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understood ASCII vs unicode to be distinct from the UTF encoding so I tried to be specific.
Let me know if I misunderstood!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've used words like "low ascii" or "one bytes utf-8" for the same thing. Either is fine by me. But in grand java tradition - what you've written is quite descriptive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe isUTF8ASCII
or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find isOneByteUTF8
clearer than other alternatives.
UTF8 encoding code points corresponding to ASCII on one byte is known, but maybe not "well enough" known. Let's leave it as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think isOneByteUTF8
the best, yeah.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool cool
private static void reverseArray(byte[] array, int start, int end) { | ||
while (start < end) { | ||
byte temp = array[start]; | ||
array[start] = array[end]; | ||
array[end] = temp; | ||
start++; | ||
end--; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to add this back to org.elasticsearch.common.util.ArrayUtils
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 6c09607
if (isOneByteUTF8(val)) { | ||
// this is the fast path. we know we can just reverse the bytes. | ||
BytesRef reversed = BytesRef.deepCopyOf(val); | ||
reverseArray(reversed.bytes, reversed.offset, reversed.offset + reversed.length - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This scans an ASCII text for three times (even though not necessarily byte-by-byte).
I guess the first pass could attempt a reverse already, though that would imply allocation for non-ASCII text.
I guess it's fine as is, probably not making a practical difference either way, can be optimised if ever needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. At least we're in O(n).
can be optimised if ever needed
Agreed
Hmmm, these failing tests appear to be a Catch-22 centered on this CSV test. When I run I'm not really sure what to do about this. Could someone take a look? |
That's the best catch. I'll look. |
Okay, now it looks like the warning returned in the multi-node scenario is not identifying the line and column number correctly:
In the CSV test, the warning is |
Ah! That's you actually! I'll link. |
|
||
@Override | ||
public void writeTo(StreamOutput out) throws IOException { | ||
out.writeNamedWriteable(field()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add source.writeTo(out)
and read it with this(Source.readFrom((PlanStreamInput) in), in.readNamedWriteable(Expression.class)
. Then you'll need to flip the test - it is setup with that alwaysempty
thing.
After that the source should appear.
This is a left over from when serializing Source
was really expensive. Back then we didn't serialize it. We should serialize it because it's really cheap now - just a few bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized UnaryScalarFunction
already takes care of this, so I simplified the constructor in ad7c68f
…98545/reverse-function
…98545/reverse-function
💚 Backport successful
|
Adds a REVERSE string function
Adds a REVERSE string function
Part of #98545
Adds a
REVERSE
function to the ES|QL language.REVERSE
reverses the characters in a string. If you want to know more, read the docs!